A Noise-Tolerating Gene Association Network Uncovering an Oncogenic Regulatory Motif in Lymphoma Transcriptomics

In cancer genomics research, gene expressions provide clues to gene regulations implicating patients’ risk of survival. Gene expressions, however, fluctuate due to noises arising internally and externally, making their use to infer gene associations, hence regulation mechanisms, problematic. Here, we develop a new regression approach to model gene association networks while considering uncertain biological noises. In a series of simulation experiments accounting for varying levels of biological noises, the new method was shown to be robust and perform better than conventional regression methods, as judged by a number of statistical measures on unbiasedness, consistency and accuracy. Application to infer gene associations in germinal-center B cells led to the discovery of a three-by-two regulatory motif gene expression and a three-gene prognostic signature for diffuse large B-cell lymphoma.


Introduction
The network modeling of biological systems can capture many of their essential characteristics [1,2]. In such modeling, biological processes are often depicted as a simple network graph where nodes represent molecules and edges that connect nodes represent their interactions or associations [3]. Although seemingly simplistic, mathematical and numerical simulations of prototype biological networks have served to provide insight into unknown structures or relationships of gene associations and regulations (e.g., [4]). A number of methods exploiting different algorithms have been developed to construct gene association networks (GANs), including graphical Gaussian models [5], Bayesian networks [6,7], and models of other approaches (see [8] for a comprehensive review).

Importance of Modeling Gene Association Networks with Biological Noises
One key issue for most of these GAN construction studies is that they assume gene expressions follow a known and well-defined probability distribution function, often a normal distribution function, i.e., a Gaussian probability function. This assumption may significantly depart from actuality, however, as gene expression is known to be influenced by non-Gaussian stochastic noises [9,10]. How the uncertainties in gene expressions and their noises are handled can have a significant impact on the resultant GAN and hence its predicted biological behaviors. Such uncertainties, called biological noises, can arise from, for example, stochastic oscillations in gene expressions [11], which can, in general, be categorized into either intrinsic or extrinsic [12]. Intrinsic noises may come from various sources, including individual events of transcription and translation, rates of biochemical reaction, or species concentrations [12,13], while extrinsic noises may be induced by external factors such as pathogens and other foreign compounds such as pharmaceuticals and Life 2023, 13, 1331 3 of 23 will drop the subscript I here. A regression model for the GAN of q simultaneous equations can then be expressed as [23]: y 1 = β 11 x 1 + β 12 x 2 + · · · + β 1p x p + ε y1 y 2 = β 21 x 1 + β 22 x 2 + · · · + β 2p x p + ε y2 . . . y q = β q1 x 1 + β q2 x 2 + · · · + β qp x p + ε yq (1) where the error terms, ε y1 , ε y2 , . . . , ε yq , as well as the expression levels of genes Xs and Ys, are random variables (i.e., non-constants).
We used the architecture of (1) for GAN construction mainly for two reasons. First, even though distribution-free modeling under the regression framework has been reported, we would like to develop a new approach with fewer assumptions. Second, unlike other approaches, such as Bayesian models, our approach could use a standard p-value cutoff of 0.05 to infer an association under the architecture of (1). This is attractive, especially when prior knowledge concerning the gene's regulation role in GAN is lacking.

Conventional Strategies for Estimating the Association Parameters
Generally speaking, Equation (1) can be a distribution-free regression model of GAN if we do not specify a probability density function for the expression and error terms, but in previous studies, including the work of [23], a normal distribution function was used to model gene expressions and measuring noises. Note also that although [25] had shown that large errors or outliers of expression data do not need to be modeled by a Gaussian distribution function in regression-based inferring of gene regulatory networks, their method nonetheless required all the errors and outliers to be modeled as symmetrically distributed residuals, which are unrealistic for real-world non-Gaussian noises. For the i-th observation (experiment), a regression model of the j-th equation of Equation (1) can be rewritten as follows: If we do not know the specific probability density functions for the observed expressions and error terms, there are, in general, three conventional LS-based strategies to estimate the association parameters of β in Equation (2). The first one is the ordinary LS estimation (LSE) strategy, for which the association can be detected by minimizing the following objective function Q j0 based on the sum of squares of the error terms ε yij 's [26] Q j0 β j1 , · · · , β jp = ∑ n i = 1 y ij − ∑ p k = 1 β jk x ik 2 (3) Assuming that these error terms are independently and identically distributed with zero mean and finite variance, LSE has some well-behaved statistical properties, including unbiasedness and minimal variance, as summarized by the Gauss-Markov theorem [26].
Generally speaking, RRE is used to combat multi-collinearity owing to the shrinkage of inflation estimation variances arising from highly correlated gene expression data, while LASSO is used to exclude zero coefficients in large-scale regression-based GAN prediction through the adjustment of shrinkage parameters in the penalty term. More on penalized LS strategies have been discussed by [24,29].

An Alternative Parameter Estimation Method
In addition to these LS-based methods (LSE, RRE, and LASSO), an alternative, distribution-free estimation in regression models is the method of grouping estimators. Wald [30] proposed a special kind of grouping estimator called the Wald-type estimator (WTE) to tackle measuring noises (variations arising from measuring processes) in simple linear regression. Wald's method divides the data into two groups according to predictor X: those above and below the observation median, respectively. The association parameters can then be estimated simply by computing the gradients of four means (those of the observed X and Y values, respectively, in the two divided groups). WTE has received little attention in the literature because it is inefficient as compared to LSE [31] and inconsistent with respect to measuring noises [32]. In addition, an assumption of independence between predictor variables is needed in multiple linear regression models, causing its poor performance in highly correlated data [17]. For more about WTE and methods of grouping, readers are referred to [31,33].
Recently, a generalized version of WTE called an adjusted Wald-type estimator (AWTE) has been developed to tackle Berkson-type uncertainties (i.e., noises in measurement but not errors caused by measuring process) and collinearity problems [17]. This nonparametric approach has several merits. First, for the multi-collinearity problem, AWTE is statistically consistent and asymptotically unbiased (overcoming the drawbacks of LSE, RRE and LASSO). Second, for the uncertainties in measurement error in conjunction with collinearities, whereas LSE may cause completely erroneous conclusions [34], AWTE can solve both problems simultaneously. It should be noted that, as Wu and Fang [17] pointed out, Berkson-type uncertainties are fundamentally different from the measuring noises discussed in [23,30] and are also different from the outliers treated in [25]: Namely, Berksontype uncertainties can arise from biological noises while the other types are products of measuring processes. The application of AWTE to GAN construction will be formally described later.

Modeling Biological Noises and Correlated Expressions
Contributions from extrinsic and intrinsic noises in biological processes and correlated expressions may lead to biased regression modeling and incorrect predictions for a GAN [15,16]. To avoid such biases and to recover true associations, we consider the effects of both intrinsic and extrinsic noises in the framework of a linear regression system.
Let us begin with the consideration of intrinsic noises for not only the target gene Y j but also the predictor gene X k in Equation (2). As pointed out by Fujita and coworkers [23], the error term, ε yij , in the regression model can be seen as intrinsic noise in the expression of the target gene, y ij . However, the intrinsic noises of predictor genes, defined as ε xik 's below, are irrespective of measuring devices, although they also appear in measurements [35]. In other words, biological noises, which are Berkson-type uncertainties, can affect both the true and the observed expressions of target genes, while measuring noises such as those Life 2023, 13, 1331 5 of 23 discussed in [23] affect only the observed expressions [36]. Therefore, if we would like to explicitly model intrinsic biological noise ε xik in predictor gene expression x ik , we can employ a Berkson-type uncertainty model [17] and rewrite Equation (2) as follows: Next, to model extrinsic noises, it is suggested by [37] that a total noise should be identified, which can be the sum of intrinsic and extrinsic noises, and that these two types of noises should be presented separately to distinguish the contributions of their different origins. Thus, to account for noises of both intrinsic and extrinsic origins simultaneously in the regression system, we can rewrite Equation (4) as follows: where the total noises in the expression of predictor gene X k and target gene Y j are ε xik + v xik and ε yij + v yij , respectively, in which v xik and v yij are extrinsic noises and ε xik and ε yij are intrinsic noises. Notice that the total noise for predictor gene X k may not influence target gene Y j if the association of these two genes is negligible, i.e., if the regression coefficient (β jk ) is very close to zero. In contrast, the total noise for target gene Y j can cause its expression to fluctuate significantly whether or not the interactions between predictor and target genes are negligible. As a result, combining all noises into a single term in the modeling is problematic if the complexities of uncertainties are overly simplified. Finally, to deal with the potential presence of collinearity, i.e., highly correlated gene expression data, we can assume that a predictor gene X l (l < p) is linearly dependent on another predictor gene X p (see [38] for a similar assumption about linear dependence between two genes); that is, . . .
Equation (6) intuitively divides gene expression x il (l < p) into two additive sources: the former source, u il , is a unique component for the predictor gene itself (i.e., independent of other genes) and the later source, x ip , is a common interaction component among p predictor genes and r l is their correlation parameter. Note that the framework of Equation (6) allows for ease of interpreting the structure of correlated observations and has been commonly used in the literature to address collinear configuration in regression analysis [17,39].
In summary, if highly correlated expression data and intracellular molecular noises are significant, WTE can be unstable, and conventional regression strategies (LSE, RRE, or LASSO) for deducing the values of association parameters β's may be greatly biased. This is because specifying the exact means and variances of the total noise contributed from manifold origins is difficult, and as a result, the assumptions of the Gauss-Markov theorem do not hold. Furthermore, it is possible to over-adjust the penalty terms in Q j1 (for LASSO) or Q j2 (for RRE) for ill-conditioned problems due to the requirement of information on regression coefficients when estimating weight parameters.

A Robust Distribution-Free Regression Method for Modeling GAN Using AWTE
To consider the effects of biological noises on inferring a GAN, we can rewrite Equation (1) for the i-th independent experiment according to Equation (5) . . .
In addition, to deal with the influence of highly correlated data in regression models, the regressors can be constrained on Equation (6). To account for the complexity that may arise from stochastic noises of manifold origins, such as those described as non-Gaussian noises, we employed AWTE to obtain the association parameters β's in Equation (7). The whole analytical procedure of the proposed distribution-free method, also referred to as AWTE, can be summarized by three primary steps.
Step 1. Determine the common interaction component (i.e., the second source of the additive combination, x ip ) in Equation (6) among p predictor genes according to their observed expression levels; it can be made by where ρ is the Pearson correlation coefficient and x k is the expression of gene X k .
Step 2. Estimate all the correlation parameters r 1 , r 2 , . . . , r p−1 in Equation (6), which can be achieved byr where l = 1, . . . , p−1, I denotes the indicator function, i.e., I[A] = 1 if A is true, and 0 otherwise, x k is an n × 1 vector with its i-th value equal to x ik for all k ≤ p, and M(x k ) is the median of all values in vector x k (i.e., the median of x 1k , x 2k , . . . , x nk ).
Step 3. Obtain the association parameter β jk in Equation (7) under the constraint of Equation (6), by usinĝ Equation (8) is the so-called AWTE, wherer p is zero and τ jk is given by Note that, if we take all the values ofr k to be zero, Equation (8) reduces to WTE. A few remarks need to be made regarding the present approach. First, it has been pointed out by [17] that AWTE (Equation (8)) is a two-stage estimation method: estimating the whole set of regression coefficients except for the case of k = p first, then all the other regression coefficients by calculating τ jp 's. In this way, AWTE can be calculated directly without using iterative or optimization algorithms. Second, the computational cost of AWTE is O(p 2 ) if q = 1 [17], and hence that of Equation (8) under Equation (6) is O(max[p 2 , pq]). In addition, as demonstrated in [17], by using this approach, we have theoretical guarantees for the robustness of the predicted GAN (see Appendix A for the theorem of robustness and its proof).

Results
In order to characterize the influences of expression noises on the performance of conventional LS-based regression methods (i.e., LSE, RRE and LASSO) and the present method (AWTE), we conducted a series of numerical simulations, in which levels of noises and sample sizes were varied to investigate the robustness of the networks constructed by these different methods.

Numerical Simulation Settings
Three common and standardized measures, power of detection (PD), false discovery rate (FDR) and inferential errors (INER), as suggested and defined in [6], were employed. Briefly summarizing, let B be the p × q parameter matrix of β in Equation (7). The three standardized measures are defined as follows: PD is the proportion of true associations (edges) in B detected; FDR is the proportion of predicted associations (edges) in B that are false detections; INER is the sum of all the deviations between estimated and true regression coefficients in B, where an edge between genes Y j and X k is regarded as detected if the absolute value of the estimate of association parameter β jk is greater than a cut-off value τ. We refer to [6] for detailed descriptions of these measures.
In these simulation experiments, we considered a system of ten predictor genes and five target genes (i.e., p = 10, q = 5) and a large set of observation samples, n ≥ 400. An association parameter β jk will be assigned a non-zero value chosen randomly from a uniformly distributed interval of [−1, −0.5] and [0.5, 1] with probability π, or set to zero otherwise with probability 1−π, where π can be regarded as the proportion of network edges that connect between X and Y genes. To avoid the situation in which all association parameters are zero, we set the regression coefficients of common interaction components, i.e., β jp 's for all j ≤ q in Equation (6), to be 0.9999. In addition, we assumed that in the same equation u il (l < p) and x ip in the additive combination of gene expression x il follow a chi-square distribution with a degree of freedom 2, and that random noises ε xik 's follow a chi-square distribution with a degree of freedom σ 2 (σ being the level of noise) and random error terms ε yij 's follow a normal distribution with zero mean and 2σ 2 variance. Note that the assumption of normal distribution for ε yij 's is commonly employed in other studies [6,23,40], and if we had used a non-Gaussian distribution for them, we would have obtained even larger errors for the conventional methods. Note also that although we used chi-square distributed intrinsic noises for predictor genes (ε xik 's) to synthesize gene expressions, the use of other types of non-Gaussian noises would not affect the conclusions of this study, because the robustness of our method has theoretical proof for noises of unknown probability density functions (see Appendix A) and this was buttressed in simulations using two other types of non-Gaussian noises (Appendix B Figure A2).
For simplicity, the common interaction component among p predictor genes was assumed to be known because it can be identified from Step 1 of the AWTE procedure. To mimic the influences of the many sorts of extrinsic noises, we furthermore assumed that the extrinsic noises v yij and v xik in Equation (7) were replaced by non-linear functions with v yij 's and v xik 's sharing the same probability density function of ε xik . Note that in our method, the distributions of observed expressions and noises, hence the non-linear functions f Y and f X , need not be known, but they need to be specified in a certain form in order to generate the synthetic data needed for the simulations.

Method Comparisons in Numerical Simulations
Based on the settings and the frameworks described in Section 3.1 and Equations (6) and (7) in Section 2, we numerically generated gene expression data and used them for a series of 1000 repeated simulation runs with prescribed parameter values representing different levels of noises and sample sizes. From these simulations, measures of PD (power of detection), FDR (false discovery rate) and INER (inferential error) were computed in the receiver operating characteristic (ROC) curve analysis in which an optimal cutoff point for best performance was determined. We evaluated the resulting predicted GANs using these standard statistical measures in Figures 1-3 where r l = 1.5 (i.e., Pearson correlation coefficient between genes X l and X p was set to be greater than 0.8, which indicates a condition of high correlation; see Equation (6)) and π = 0.4 (the ratio of network connectivities that are truly associated; see Section 3.1) were fixed to examine the effects of different levels of noises (indicated by σ 2 ) and sample sizes. These allowed us to evaluate the performance and robustness of different methods under conditions of high collinearities.
As expected and shown by Figures 1 and 2, higher levels of noise led to larger values of INER and FDR for all the four methods investigated. However, compared to three conventional regression methods, LASSO [27], RRE [28] and LSE [26], our method (AWTE) was significantly less sensitive to increasing levels of noises for both INER ( Figure 1) and FDR ( Figure 2), especially as the sample size increased. Since PD, the proportion of correctly inferred network edges for all non-zero association parameters (β), can be high even when FDR is also high, the two need to be evaluated together. A combined measure, the square root of FDR 2 + (1 − PD) 2 , was therefore used, and the results are shown in Figure 3. As can be seen, when the sample size was small, AWTE performed somewhat worse than the three conventional methods when the noise level was low, but as the sample size increased, AWTE gradually gained an advantage and then significantly outperformed the conventional methods when the noise level was high. Notably, the GAN was inferred within 20 s on a general-purpose PC equipped with Intel CPU i7-4790 and 8 GB of RAM for a total of 1000 simulation runs using AWTE; similar timings were obtained with RRE and LSE, but LASSO took over 300 min to complete the same task. computed in the receiver operating characteristic (ROC) curve analysis in which an optimal cutoff point for best performance was determined. We evaluated the resulting predicted GANs using these standard statistical measures in Figure 1, Figure 2 and Figure  3 where rl = 1.5 (i.e., Pearson correlation coefficient between genes Xl and Xp was set to be greater than 0.8, which indicates a condition of high correlation; see Equation (6)) and π = 0.4 (the ratio of network connectivities that are truly associated; see Section 3.1) were fixed to examine the effects of different levels of noises (indicated by σ 2 ) and sample sizes. These allowed us to evaluate the performance and robustness of different methods under conditions of high collinearities. As expected and shown by Figure 1 and Figure 2, higher levels of noise led to larger values of INER and FDR for all the four methods investigated. However, compared to three conventional regression methods, LASSO [27], RRE [28] and LSE [26], our method (AWTE) was significantly less sensitive to increasing levels of noises for both INER ( Figure  1) and FDR (Figure 2), especially as the sample size increased. Since PD, the proportion of correctly inferred network edges for all non-zero association parameters (β), can be high even when FDR is also high, the two need to be evaluated together. A combined measure, the square root of FDR 2 + (1 − PD) 2 , was therefore used, and the results are shown in Figure  3. As can be seen, when the sample size was small, AWTE performed somewhat worse than the three conventional methods when the noise level was low, but as the sample size increased, AWTE gradually gained an advantage and then significantly outperformed the conventional methods when the noise level was high. Notably, the GAN was inferred within 20 s on a general-purpose PC equipped with Intel CPU i7-4790 and 8 GB of RAM for a total of 1000 simulation runs using AWTE; similar timings were obtained with RRE and LSE, but LASSO took over 300 min to complete the same task.     To gain further insight into the theoretical behaviors of the proposed approach, simulations using a broader range of sample sizes and different scales of simultaneous expression equations (q = 5, 10, 100, 1000) were conducted and analyzed. A typical result is shown in Figure 4 (also see Appendix B Figure A2), where a low level of noise (σ 2 = 1) was applied, and the number of predictor genes, X, was set to be 10 (p = 10). As can be seen  Figure 4A, estimation deviations, as indicated by INER, decreased as the sample size increased. Furthermore, as shown in Figure 4B, although INER increased with an increased number (q) of target genes ( Figure 4A), the average estimation deviation of β, i.e., INER/pq, remained nearly constant at any given sample size for all the numbers of target genes tested and decreased as the sample size increased. These results confirm the Theorem A1 (see Appendix A) and demonstrate the robustness of AWTE. These results also indicate that a desirable outcome (e.g., with PD > 0.95 and FDR < 0.05) can be expected using AWTE when a sufficiently large number of observations (e.g., sample size n > 3000) are available. Importantly, all these statistical behaviors remained true when a standard p-value cutoff (0.05) was used to predict an association (Appendix B Figure A3), although the results for PD were somewhat worse than those shown in Figure 4 where a threshold value for the association parameters was selected to produce the best ROC performance. This is an important point to make because it demonstrates the potential of AWTE to reliably predict gene associations from gene expression data in the absence of literature knowledge on those associations. shown in Figure 4 (also see Appendix B Figure A2), where a low level of noise (σ 2 = 1) was applied, and the number of predictor genes, X, was set to be 10 (p = 10). As can be seen from Figure 4A, estimation deviations, as indicated by INER, decreased as the sample size increased. Furthermore, as shown in Figure 4B, although INER increased with an increased number (q) of target genes ( Figure 4A), the average estimation deviation of β, i.e., INER/pq, remained nearly constant at any given sample size for all the numbers of target genes tested and decreased as the sample size increased. These results confirm the Theorem A1 (see Appendix A) and demonstrate the robustness of AWTE. These results also indicate that a desirable outcome (e.g., with PD > 0.95 and FDR < 0.05) can be expected using AWTE when a sufficiently large number of observations (e.g., sample size n > 3000) are available. Importantly, all these statistical behaviors remained true when a standard p-value cutoff (0.05) was used to predict an association (Appendix B Figure A3), although the results for PD were somewhat worse than those shown in Figure 4 where a threshold value for the association parameters was selected to produce the best ROC performance. This is an important point to make because it demonstrates the potential of AWTE to reliably predict gene associations from gene expression data in the absence of literature knowledge on those associations.

Method Comparisons Using an Actual Lymphoma Dataset
To evaluate the potential of the proposed method for practical applications, we tested it on a known network of TF (transcription factor) genes and the genes they regulate in the germinal-center regulatory program of B cells. As reviewed in [41], dysregulation of this network is a cause of many types of lymphomas. In this case, a study using actual gene expression data of lymphoma, we would like to find out to what extent the experimentally documented associations of this GAN of germinal center B-cell genes can be predicted by AWTE, in comparison to conventional methods.
We retrieved the gene expression data of diffuse large B-cell lymphoma (DLBCL) contributed by [42] from Gene Expression Omnibus (GEO, [43]). This dataset (GEO: GSE60) contained data for five (BCL6, BACH2, SPIB, IRF8, and OCT2) of the seven TF genes (modeled as predictor genes) and all the ten target genes (p21, MYC, P53, BCL2, NFKB1, IRF4, Blimp1, AID, p27, and ATR) of the germinal center regulation network described in Figure 1 of [41]. The data of the total sample size (N = 133) from GSE60 for both normal cells (N = 31) and tumor cells (N = 102) containing both GCB (germinal center B-cell) and ABC (activated B-cell) subtypes were retrieved and analyzed.
The AWTE-produced association parameters for the network (a 5 × 10 matrix) of this dataset are presented in Table A1 (Appendix B), and the ROC performances for AWTE, LASSO, RRE, and LSE are shown in Figure 5. The ROC performances were determined by treating the experimentally observed associations (dash-boxed in Table A1) as real and all the rest as nonexistent-ignoring the fact that absence of observation does not necessarily equate to the absence of association. As can be seen in Figure 5, AWTE, having the largest area under the ROC curve (AUC), significantly outperformed the three conventional methods, which did not perform better than random guesses (the diagonal line in the ROC plot). Interestingly, of the several associations predicted using AWTE with statistical significance (p-value < 0.05; bold-typed in Table A2) but have not yet been verified by human data, three (asterisked in Table A1) can find support from mouse studies: SPIB-AID [44], BACH2-AID [45], and IRF8-IRF4 [46]. In summary, a gene trio motif of gene regulation could be clearly identified in Table A1 based on the statistical significance of the gene associations deduced: namely, SPIB, BACH2, and OCT2 are regulators of oncogenes IRF4 and AID.

Method Comparisons Using an Actual Lymphoma Dataset
To evaluate the potential of the proposed method for practical applications, we tested it on a known network of TF (transcription factor) genes and the genes they regulate in the germinal-center regulatory program of B cells. As reviewed in [41], dysregulation of this network is a cause of many types of lymphomas. In this case, a study using actual gene expression data of lymphoma, we would like to find out to what extent the experimentally documented associations of this GAN of germinal center B-cell genes can be predicted by AWTE, in comparison to conventional methods.
We retrieved the gene expression data of diffuse large B-cell lymphoma (DLBCL) contributed by [42] from Gene Expression Omnibus (GEO, [43]). This dataset (GEO: GSE60) contained data for five (BCL6, BACH2, SPIB, IRF8, and OCT2) of the seven TF genes (modeled as predictor genes) and all the ten target genes (p21, MYC, P53, BCL2, NFKB1, IRF4, Blimp1, AID, p27, and ATR) of the germinal center regulation network described in Figure  1 of [41]. The data of the total sample size (N = 133) from GSE60 for both normal cells (N = 31) and tumor cells (N = 102) containing both GCB (germinal center B-cell) and ABC (activated B-cell) subtypes were retrieved and analyzed.
The AWTE-produced association parameters for the network (a 5 × 10 matrix) of this dataset are presented in Table A1 (Appendix B), and the ROC performances for AWTE, LASSO, RRE, and LSE are shown in Figure 5. The ROC performances were determined by treating the experimentally observed associations (dash-boxed in Table A1) as real and all the rest as nonexistent-ignoring the fact that absence of observation does not necessarily equate to the absence of association. As can be seen in Figure 5, AWTE, having the largest area under the ROC curve (AUC), significantly outperformed the three conventional methods, which did not perform better than random guesses (the diagonal line in the ROC plot). Interestingly, of the several associations predicted using AWTE with statistical significance (p-value < 0.05; bold-typed in Table A2) but have not yet been verified by human data, three (asterisked in Table A1) can find support from mouse studies: SPIB-AID [44], BACH2-AID [45], and IRF8-IRF4 [46]. In summary, a gene trio motif of gene regulation could be clearly identified in Table A1 based on the statistical significance of the gene associations deduced: namely, SPIB, BACH2, and OCT2 are regulators of oncogenes IRF4 and AID.  The results obtained with models of three conventional regression methods were presented in Tables A3-A5. Generally speaking, the patterns of their GANs were quite similar to ours, but notable differences existed. For example, both IRF4 and AID were downregulated by BACH2 in the AWTE-inferred GAN, but only AID was in the LS-inferred GAN. In addition, whereas the down-regulation of IRF4 by BCL6 was evident in both GANs, the up-regulation of AID by BCL6 was significant only in the LS-inferred GAN. In combination, these results appear to reflect the experimental observation of the tumor suppressor role of BACH2 [47] and the dual regulation role of BCL6 [41].

Discussion
The LS strategy, which is mathematically equivalent to the maximum likelihood estimation, is known to perform well for systems with Gaussian noises, i.e., noises that are characterized by normal distributions [26]. However, when noises are non-Gaussian, LS-based methods can be unsatisfactory. For example, as can be seen in Figure 1, when the sample size is as large as 3200, conventional methods can be unstable under conditions of non-Gaussian noises, with LSE and RRE having a wide range of INER even when the level of noise is not high (e.g., σ 2 = 1). In addition, as can be seen from Figure 3 at sample size = 3200, LASSO performed better (lower median in the box plot) than LSE and RRE in the case of σ 2 > 1 but worse in the case of σ 2 ≤ 1, which suggests that in this example LASSO failed the test of robustness. Taken together, we can conclude that, as did others [15,48], a predicted gene network might be non-functional (e.g., with high INER and high FDR values) or even incorrect if the effects of intrinsic or extrinsic noises are ignored or overly simplified to reduce analysis complexities. Indeed, the expressions of eight of the fifteen genes analyzed in Table A1 for lymphoma did not pass the normality test (Appendix B Figure A4), which could be a reason for the poor ROC results of the LS-based methods for predicting the B-cell GAN ( Figure 5).
To assess the potential use of the gene trio regulation motif for practical applications, we conducted a subtype analysis in DLBCL GCB and ABC using data from GSE60. As may be seen in Figure A5A, the regulations of the motif, as suggested in GCB patients' gene expression data, are consistent with the overall trend of our model (Table A1). However, in Figure A5B), the down-regulating function of BACH2 is nearly non-existent, while the up-regulating function of SPIB and OCT2 to the two oncogenes in the ABC subtype is stronger compared to the GCB subtype. Although we do not know specific mechanisms of how these TF genes can help differentiate the subtypes, these observations may suggest that over-expression of SPIB and OCT2, as well as malfunction of BACH2, could be probable causes leading to higher IRF4/AID expressions and resulting in different clinical outcomes for patients with different subtypes.
In the present work, we did not consider measuring noises because their modeling may require additional experimental data and/or analysis procedures, as well as a distributiondependent approach (e.g., [23]). It is a problem not within the scope of the present study but will be a subject of our future research.
There are a few other limitations of our method in its current form. Firstly, although a wider range of design specifications can be used to construct GANs because AWTE can model uncertain noises with fewer constraints, our method may not perform as well as conventional LS-based methods if the number of observations is not sufficiently large, as Figures 1-3 indicate. However, array-based experiments and other high-throughput technologies to produce very large expression datasets have become increasingly available in recent years, as in studies using TCPA (The Cancer Proteome Atlas, [49]; sample size > 3000), TCGA (The Cancer Genome Atlas, [50]; sample size > 800) or UK biobank ( [51]; sample size of 500,000 around), this limitation of sample size may soon become a non-issue in many applications.
Secondly, if in the model the number of predictor genes, p, is larger than that of experiments, n, overfitting may occur, which is a major statistical limitation of linear regression analysis [24]. To circumvent this problem, automatic variable selection techniques (e.g., Life 2023, 13, 1331 13 of 23 stepwise, forward, or backward selections) can be potentially helpful to screen for favorite predictor genes so as to consider only a smaller number of them (i.e., n > p) in applying the proposed approach. Or, as demonstrated by the case study of lymphoma in the present work, knowledge and information from the literature, despite being far from complete and often not straightforward, can be harnessed for the new method to make insightful discoveries on gene regulations.
Thirdly, we did not consider time series data mainly because regression modeling for time series observations often requires distribution-dependent procedures (see, e.g., [52]) or a distribution-free procedure as in the work of [53], for which, however, theoretical justifications are still lacking to prove that a generalized LS-based approach can address well the manifold uncertainties associated with the predictors of interest. Further studies are required to fully address this statistical issue. Fourthly, our model was derived from data from an older array platform, which may cause biases in the analysis and hence reduce the accuracy of the results. The predictive value of the gene trio motif has also not been fully investigated, although in a preliminary analysis we found that the trio can be a prognostic signature to distinguish survival risks of lymphoma patients ( Figure A6). Further validation with newer data of the model and the gene trio motif in cancer gene regulation is ongoing.
Finally, our method in the present work was applied to only a handful of variables (genes). In principle, one could consider all TFs as predictor genes to regulate all other genes and build a whole-genome TF-centered GAN. However, it remains to be investigated if the existing data are sufficient to overcome overfitting for such an undertaking. A strategy such as principal component analysis to shrink the dimension of these TFs while keeping all the data in the analysis may be necessary.  x k (i.e., v x1k , v x2k , . . . , v xnk ), in regression model (7), for all positive integers j and k, are all independently and identically sampled random variables. Furthermore, their fourth moments are finite; that is, where g yj and g xk are unknown probability density functions of the intrinsic noises of y j and x k , respectively, and h yj and h xk are unknown probability density functions of the extrinsic noises of y j and x k , respectively.
(D2) The expressions of unique component for predictor gene X l (i.e., u 1l , u 2l , . . . , u nl for l < p) and those of common interaction component (i.e., x 1p , x 2p , . . . , x np ) in Equation (6) are all independently and identically sampled random variables. Furthermore, their fourth moments are finite; that is, where f ul and f xp are unknown probability density functions of u il and x ip , respectively.
(D3) The observed gene expressions x 1p , x 2p , . . . , x np and the expressions of unique component u 1l , u 2l , . . . , u nl for predictor gene X l in Equation (6), for all positive integers l < p, are independent of intrinsic and extrinsic noises. (D4) The expressions of unique component u 1t , u 2t , . . . , u nt for predictor gene X t are independent of those of all the other unique component u 1l , u 2l , . . . , u nl and common interaction component x 1p , x 2p , . . . , x np in Equation (6), for all t not equal to l.
for all real number c except for the case of c = 1 and k = p.
Proof of Theorem A1. Below, for the convenience of describing the mathematical proofs, we shall define a few nomenclatures. Let |.| denote the absolute value of a real number (or a random variable) and p(A) denote the probability of an event (or a set) A.
First, we will prove the statistical consistency of Equation (8); that is, for an arbitrary positive number ε, we need to show Let R Ut , R EXt , R VXt , R EYj and R VYj be defined as follows: Observe that where r * s is the maximal value of |r s | for all sample size n. According to the assumption of independent and identical sampling made in design specifications D1-D4, it immediately follows from Lemma-(i) and the similar arguments of (A.2) and (A.3) in [17] that, for arbitrary positive numbers λ m , m = 1, 2, . . . , 8, and that lim and that Hence, based on (A2)-(A5), and let ε * = ε/(3pq − q), we have which concludes that (A1) is true. Thus, we have completed the first part of proof of the Theorem A1, i.e., Equation (8) is a statistically consistent estimation for all the regression coefficients in Equation (6).
As for the second part of the proof, we will prove approximate unbiasedness of Equation (8); that is, we need to show where E(X) denotes the expectation of a random variable X. Observe that If we can show that and that the proof of the Theorem A1 will be complete, i.e., (A6) is true, because, based on (A7)-(A9), The remaining part of the proof now is to show that (A8) and (A9) hold. We will first show that (A9) is true. According to the assumption of independent and identical sampling in design specifications D1, D3, and D5, and to Lemma-(iv) and the analogous arguments of proof of (A.6) in [17], there must exist a random variable D VY , which is greater than |R VYj | such that the expectation of D VY is finite (smaller than infinity). Following (A4) and the theorem of dominated convergence (see, for example, [54]), we have which concludes that (A9) holds. Analogously, we come to the conclusion that (A8) also holds via repeating similar arguments of proof for (A9).
As above, we have completed the proof of Theorem A1.

Appendix B
Life   Figure A2. The performance of AWTE as a function of target gene number q, using different types of non-Gaussian noise. In each performance evaluation, 1000 repeated simulation runs were conducted using 10 predictor genes (p = 10) and a small noise level of σ 2 = 1, and non-Gaussian noises generated by (A) gamma, and (B) log-normal, probability distribution function. Figure A3. The performance of AWTE as a function of sample size and varying number (q) of target genes, with associations inferred by statistical test of significance. In this evaluation, 1000 repeated simulation runs were conducted using 10 predictor genes (p = 10) and a small level of noise σ 2 = 1, and a standard p-value cutoff (0.05) was used to infer an association. Because computing all the p-values is time consuming, the simulations were conducted only for q ≤ 20.  (Table A1) examined among the 133 samples of the GSE60 lymphoma dataset. Histograms in pink are the genes exhibiting a statistically significant deviation from normal distribution (p-value < 0.05), and in blue are those not. Standard normal distribution is drawn by red curve in each histogram. of non-Gaussian noise. In each performance evaluation, 1000 repeated simulation runs were conducted using 10 predictor genes (p = 10) and a small noise level of σ 2 = 1, and non-Gaussian noises generated by (A) gamma, and (B) log-normal, probability distribution function. Figure A3. The performance of AWTE as a function of sample size and varying number (q) of target genes, with associations inferred by statistical test of significance. In this evaluation, 1000 repeated simulation runs were conducted using 10 predictor genes (p = 10) and a small level of noise σ 2 = 1, and a standard p-value cutoff (0.05) was used to infer an association. Because computing all the p-values is time consuming, the simulations were conducted only for q ≤ 20.           Figure 1 of [41] are: SPIB-IRF4 [55], BACH2-BCL2 [47], OCT2-MYC [56], OCT2-BCL2 [57], OCT2-NFKB1 [56], OCT2-IRF4 [56], OCT2-Blimp1 [58], and OCT2-AID [59]. Those inferred by ROC curve analysis to yield the best AUC ( Figure 5; 0.26 being the cut-off) are underlined. Those with t-test under the null hypothesis β = 0 being significant (p-value < 0.05) are boldfaced (see Table A2 below for the p-values). The three asterisked associations have support from mouse studies: SPIB-AID [44], BACH2-AID [45], and IRF8-IRF4 [46].